Kernel Methods for Tree Structured Data
Abstract
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional machine learning methods deal with vectorial information, they require some a priori preprocessing to turn structures into vectors. Among the learning techniques for structured data, kernel methods are recognized as effective approaches with a strong theoretical background. They do not require an explicit vectorial representation of the data in terms of features. Instead, they rely on a measure of similarity between any pair of objects in a domain, the kernel function. Designing kernel functions that are both fast and effective is a challenging problem. In the case of tree-structured data, two issues become relevant: kernels for trees should not be sparse, and they should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures in the dataset are completely dissimilar to one another. In those cases the classifier has too little information to make correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest-neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets whose node labels belong to a large domain. A second drawback of tree kernels is the time complexity they require in both the learning and the classification phases. This complexity can sometimes prevent the application of a kernel in scenarios involving large amounts of data.
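To make the sparsity problem concrete, the following is a minimal Python sketch of the subtree (ST) and subset tree (SST) kernels. It uses their usual recursive (Collins and Duffy style) formulation with a decay factor `lam`, and it is not the thesis's own implementation. Several choices are assumptions made for illustration only:

- the tuple encoding of trees;
- every helper name (`T`, `is_leaf`, `nodes`, `production`, `delta`, `tree_kernel`, `normalized_kernel`, `gram`);
- the convention that leaves are terminal words that cannot root a fragment.

Because `tree_kernel` sums `delta` over every pair of nodes, even this memoized version does work proportional to the product of the two trees' node counts. That is the kind of cost the abstract points to.

```python
import math
from functools import lru_cache

# Hypothetical encoding for this sketch: a tree is an immutable (label, children) tuple.
def T(label, *children):
    return (label, tuple(children))

def is_leaf(t):
    return not t[1]

def nodes(t):
    """Yield every node of t in pre-order."""
    yield t
    for child in t[1]:
        yield from nodes(child)

def production(t):
    """Node label together with the ordered labels of its children."""
    return (t[0], tuple(c[0] for c in t[1]))

@lru_cache(maxsize=None)  # memoized on subtree structure, so each node pair is computed once
def delta(n1, n2, lam, subset):
    """Weighted number of common fragments rooted at n1 and n2.

    subset=True:  SST fragments (any child may be left unexpanded).
    subset=False: ST fragments (a node together with all of its descendants).
    """
    # Assumption: leaves are terminal words and never root a fragment.
    if is_leaf(n1) or production(n1) != production(n2):
        return 0.0
    result = lam
    for c1, c2 in zip(n1[1], n2[1]):
        if is_leaf(c1) and is_leaf(c2):
            continue  # equal leaves are already matched by the production check
        d = delta(c1, c2, lam, subset)
        result *= (1.0 + d) if subset else d
    return result

def tree_kernel(t1, t2, lam=1.0, subset=True):
    """K(T1, T2): sum of delta over all |N1| x |N2| node pairs."""
    return sum(delta(n1, n2, lam, subset) for n1 in nodes(t1) for n2 in nodes(t2))

def normalized_kernel(t1, t2, lam=1.0, subset=True):
    """Cosine normalization, so every tree has similarity 1.0 with itself."""
    k11 = tree_kernel(t1, t1, lam, subset)
    k22 = tree_kernel(t2, t2, lam, subset)
    return tree_kernel(t1, t2, lam, subset) / math.sqrt(k11 * k22)

def gram(trees, lam=1.0, subset=True):
    """Normalized Gram matrix over a list of trees."""
    return [[normalized_kernel(a, b, lam, subset) for b in trees] for a in trees]

if __name__ == "__main__":
    cat = T("NP", T("D", T("the")), T("N", T("cat")))
    dog = T("NP", T("D", T("the")), T("N", T("dog")))
    print(tree_kernel(cat, dog, subset=True))   # 3.0: [D the], [NP D N], [NP [D the] N]
    print(tree_kernel(cat, dog, subset=False))  # 1.0: only the complete subtree [D the]

    # Words drawn from a large vocabulary: no lexical production is shared.
    sentences = [T("S", T("VP", T(v)), T("NP", T(n)))
                 for v, n in [("runs", "home"), ("eats", "fish"), ("reads", "books")]]
    for row in gram(sentences, subset=False):
        print([round(x, 3) for x in row])       # identity matrix
    for row in gram(sentences, subset=True):
        print([round(x, 3) for x in row])       # off-diagonal 0.167 (shared [S VP NP])
```

In the second example the verbs and nouns all differ, so under ST the normalized Gram matrix collapses to the identity. Each tree is similar only to itself, which is the nearest-neighbour-like situation the abstract describes. SST still shares the bare `S -> VP NP` fragment, giving an off-diagonal similarity of 1/6.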
Similar resources
A preliminary empirical comparison of recursive neural networks and tree kernel methods on regression tasks for tree structured domains
The aim of this paper is to start a comparison between Recursive Neural Networks (RecNN) and kernel methods for structured data, specifically Support Vector Regression (SVR) machine using a Tree Kernel, in the context of regression tasks for trees. Both approaches can deal directly with a structured input representation and differ in the construction of the feature space from structured dat...
Tree Kernel Usage in Naive Bayes Classifiers
We present a novel approach in machine learning by combining naïve Bayes classifiers with tree kernels. Tree kernel methods produce promising results in machine learning tasks containing tree-structured attribute values. These kernel methods are used to compare two tree-structured attribute values recursively. Until now, tree kernels have only been used in kernel machines like Support Vector Machines o...
Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information
This paper proposes a tree kernel with context-sensitive structured parse tree information for relation extraction. It resolves two critical problems in previous tree kernels for relation extraction in two ways. First, it automatically determines a dynamic context-sensitive tree span for relation extraction by extending the widely-used Shortest Path-enclosed Tree (SPT) to include necessary conte...
Tree Kernel-based SVM with Structured Syntactic Knowledge for BTG-based Phrase Reordering
Structured syntactic knowledge is important for phrase reordering. This paper proposes using convolution tree kernel over source parse tree to model structured syntactic knowledge for BTG-based phrase reordering in the context of statistical machine translation. Our study reveals that the structured syntactic features over the source phrases are very effective for BTG constraint-based phrase re...
Exploring syntactic structured features over parse trees for relation extraction using kernel methods
Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper proposes to use the convolution kernel over parse trees together with support vector machines to model syntactic structured information for relation extraction. Compared with linear kernels, tree kernels can effe...